Journal of Medical Internet Research

81 training papers, 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Can AI Match Human Experts? Evaluating LLM-Generated Feedback on Resident Scholarly Projects
2026-03-04 medical education 10.64898/2026.03.04.26346878
Top 0.7% (9.9%)

Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ing...

2
Large language models for self-administered conversational vignette assessment of provider competencies: A pilot and validation study in Vietnam with automated LLM-powered transcript classification
2026-03-04 health economics 10.64898/2026.03.02.26347479
Top 0.7% (9.8%)

We developed and validated a self-administered clinical vignette platform powered by a large language model (LLM), deployed through a SurveyCTO web survey, to measure primary health care provider competencies in Vietnam. In a pilot focus group, nine physicians rated LLM-simulated patient interactions as realistic (mean 3.78/5) and user-friendly. In the validation phase, 22 providers completed 132 vignette interactions across ten clinical scenarios in Vietnamese. Essential diagnostic checklist sc...

3
Personalized Insights Derived from Wearable Device Data and Large Language Models to Improve Well-Being
2026-03-04 health informatics 10.64898/2026.03.03.26347299
Top 0.9% (9.0%)

Health behaviors such as physical activity and sleep affect mental health, but the effect of each health behavior varies substantially across individuals, limiting the usefulness of generic behavioral recommendations. We collected one year of continuous wearable and ecological momentary assessment data from 3,139 participants in the Intern Health Study (2018-2023), and examined individual-level associations between wearable-derived features and mood across the internship year. The behaviors asso...

4
Perceptions of Artificial Intelligence in the Editorial and Peer Review Process: A Cross-Sectional Survey of Traditional, Complementary, and Integrative Medicine Journal Editors
2026-03-04 health informatics 10.64898/2026.03.04.26347571
Top 1% (7.6%)

Background: Artificial intelligence chatbots (AICs) are increasingly being integrated into scholarly publishing, with the potential to automate routine editorial tasks and streamline workflows. In traditional, complementary, and integrative medicine (TCIM) publishing, editorial and peer review processes can be particularly complex due to diverse methodologies and culturally embedded knowledge systems, presenting unique opportunities and challenges for AIC adoption. Methods: An anonymous, online cro...

5
No evidence of increased gaming-related problems with long-term use of a video game therapeutic: Exploratory endpoint findings from a randomized controlled trial
2026-03-05 psychiatry and clinical psychology 10.64898/2026.03.04.26347656
Top 2% (7.5%)

Digital therapeutics for mental health often face low patient engagement, which limits their clinical impact. Interventions that deliver treatment using a video game medium may improve engagement and therapeutic efficacy, but the putative emergence of gaming-related problems remains a concern among clinical stakeholders. We examined whether long-term engagement with Meliora, a video game therapeutic for adult major depressive disorder, was associated with changes in gaming-related problems in a ...

6
Population differences in wearable device wear time: Rescuing data to address biases and advance health equity
2026-03-06 health informatics 10.64898/2026.03.06.26347799
Top 2% (6.8%)

Wearable devices present transformative opportunities for personalized healthcare through continuous monitoring of digital biomarkers; however, individual variations in device wear time could mask or otherwise impact signal identification. Despite the widespread adoption of wearable devices in research, no comprehensive framework exists for understanding how wear time varies across populations or for addressing wear time-related biases in analysis. Using Fitbit data from 11,901 participants in t...

7
Evaluating a Locally Deployed 20-Billion Parameter Large Language Model for Automated Abstract Screening in Systematic Reviews
2026-03-04 health informatics 10.64898/2026.03.04.26347506
Top 2% (6.4%)

Background: Systematic reviews (SRs) are essential for evidence-based medicine but require extensive time and resources for abstract screening. Large language models (LLMs) offer potential for automating this process, yet concerns about data privacy, intellectual property protection, and reproducibility limit the use of cloud-based solutions in research settings. Objective: To evaluate the performance of a locally deployed 20-billion parameter LLM for automated abstract screening in systematic revi...

8
Red-Teaming Medical AI: Systematic Adversarial Evaluation of LLM Safety Guardrails in Clinical Contexts
2026-03-05 health informatics 10.64898/2026.02.26.26347212
Top 3% (6.0%)

Background: Large language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. Objective: To systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designe...

9
Medical concept understanding in large language models is fragmented
2026-03-05 health informatics 10.64898/2026.03.03.26347552
Top 3% (5.9%)

Large language models (LLMs) perform strongly across a wide range of medical applications, yet it remains unclear whether such success reflects genuine understanding of medical concepts. We present an ontology-grounded, concept-centered evaluation of medical concept understanding in LLMs. Using 6,252 phenotype concepts from Human Phenotype Ontology, we decompose concept understanding into three core dimensions--concept identity, concept hierarchy, and concept meaning--and design corresponding be...

10
Class imbalance correction in artificial intelligence models leads to miscalibrated clinical predictions: a real-world evaluation
2026-03-05 health informatics 10.64898/2026.03.04.26347634
Top 3% (5.7%)

Background: Predictive models employing machine learning algorithms are increasingly being used in clinical decision making, and improperly calibrated models can result in systematic harm. We sought to investigate the impact of class imbalance correction, a commonly applied preprocessing step in machine learning model development, on calibration and modelled clinical decision making in a large real-world context. Methods: A histogram boosted gradient classifier was trained on a highly imbalanced na...

11
Show Your Work: Verbatim Evidence Requirements and Automated Assessment for Large Language Models in Biomedical Text Processing
2026-03-04 health informatics 10.64898/2026.03.03.26346690
Top 4% (5.4%)

Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005-2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...

12
Variability in Automated Sepsis Case Detection: A Systematic Analysis of Implementation Methods in Clinical Data Repositories
2026-03-04 health informatics 10.64898/2026.02.27.26347259
Top 4% (5.0%)

Objective: To systematically identify and characterize methodological heterogeneity in sepsis case detection methods using the MIMIC-III database or the eICU-CRD, and to quantify the resulting variability in sepsis detection rates. Materials and Methods: We conducted a PRISMA-guided systematic review of PubMed and Web of Science (2016-2024), and stratified studies by cohort definition to obtain comparable subsets. We extracted information on sepsis case detection methodology across six domains: par...

13
Thyroid Cancer Risk Prediction from Multimodal Datasets Using Large Language Model
2026-03-06 health informatics 10.64898/2026.03.05.26347766
Top 4% (4.9%)

Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Diagnostic methods that medical practitioners use at present depend on their personal judgment to evaluate both imaging results and separate clinical tests, which creates inconsistency that leads to incorrect medical evaluations. The combination of radiological imaging with clinical information syst...

14
Student Scholarly Research Programs in US Medical Schools: Cross-sectional Web Audit
2026-03-04 medical education 10.64898/2026.03.03.26347497
Top 4% (4.6%)

Background: Participating in research during medical school is supported by institutional programs and may influence subsequent professional development. Objective: We aimed to describe the current status and heterogeneity of scholarly research programs for medical students in the United States, including expectations, support, and key structural features. Methods: We conducted a cross-sectional web audit of official webpages for all accredited US MD- and DO-granting medical schools (search performe...

15
Cultryx: Precision Diagnostic Stewardship for Blood Cultures Using Machine Learning
2026-03-04 infectious diseases 10.64898/2026.02.27.26347214
Top 5% (3.8%)

Background: The 2024 blood culture bottle shortage brought diagnostic resource allocation to the forefront, reflecting persistent, foundational challenges with low-value testing and empiric treatment approaches under clinical uncertainty. Objective: To determine whether a machine learning approach using electronic medical record data can predict bacteremia more effectively than existing systems and practices to guide diagnostic testing and empiric treatment strategies. Methods: In a retrospective co...

16
Trustworthy personalized treatment selection: causal effect-trees and calibration in perioperative medicine
2026-03-04 health informatics 10.64898/2026.03.03.26347440
Top 6% (3.1%)

Background: Personalized medicine promises to tailor treatments to the individual, but it carries a hidden risk: mistaking statistical noise for actionable clinical insight. Current machine learning approaches often provide predictions, but fail to inform clinicians when those predictions are unreliable. Objective: Develop a deployment-readiness framework that integrates causal inference, interpretable effect-trees, and calibration assessment to distinguish actionable signal from unreliable variati...

17
A Qualitative Study of Patient and Healthcare Provider Perspectives on Mobile Health Assessments for Cervical Spondylotic Myelopathy
2026-03-05 health informatics 10.64898/2026.03.04.26347622
Top 6% (2.9%)

Objective: Evaluating and monitoring patients with cervical spondylotic myelopathy (CSM) remains a challenge due to limited tools for assessing objective neurological disability longitudinally and in the home environment. Given their prevalence and low cost, mobile health (mHealth) tools, and specifically smartphone technologies, offer a promising approach to fill this gap. This study explored stakeholder perspectives on the role of mHealth in CSM monitoring to inform development of a smartphone-based ...

18
Enhancing competency in clinical trials management: Findings from a multicountry trial coordinators interventional training program
2026-03-04 medical education 10.64898/2026.03.03.26347517
Top 7% (2.3%)

Background: Clinical research coordinators play a crucial role in ensuring the scientific rigor, regulatory compliance, and operational integrity of clinical trials. However, in Africa, they often lack access to structured, competency-based training, especially in operational, regulatory, and trial management domains. This study evaluated the effectiveness of a comprehensive training intervention designed to standardize and enhance core competencies of clinical trial coordinators. Methods: We condu...

19
Effects of morning and evening narrowband blue light and myopic defocus on axial length in humans
2026-03-04 ophthalmology 10.64898/2026.03.03.26347502
Top 7% (2.2%)

Purpose: To investigate the effects of morning and evening narrowband blue light exposure on axial length, and to examine the short-term effect of morning blue light combined with myopic defocus on axial length. Methods: For objective 1, 18 individuals underwent 60 minutes of narrowband blue light exposure (460 nm) in the morning (9:00-11:00 AM) and evening (5:00-7:00 PM) of the same day. The axial length values were normalized to the average of the morning and evening axial length values. For objecti...

20
Evaluating Essential Coaching for Every Mother Tanzania (ECEM-TZ) as a postpartum text message digital health solution: A randomized controlled trial
2026-03-04 public and global health 10.64898/2026.03.03.26347504
Top 7% (2.2%)

Background: Text messages are a low-cost digital health solution that can provide information directly to mothers. We aimed to evaluate a text message program, called Essential Coaching for Every Mother Tanzania (ECEM-TZ), designed to improve maternal access to essential newborn care education during the immediate 6-week postnatal period. Methods: A randomized controlled trial was conducted in Dar es Salaam, Tanzania. ECEM-TZ consists of standardized text messages from birth to 6 weeks postpartum t...